{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Parallel computing with Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Nowadays, most computers have more than one physical CPU such that they can compute things in parallel. Even your mobile phones have multiple cores. In an ideal world, $N$ CPUs allow for a speed-up of a factor of $N$ of (certain) computations.\n",
    "\n",
    "In big computing clusters, certain computations can even run on hundreds to thousands of CPUs simultaneously. For example, the [Illustris TNG](http://www.tng-project.org/) simulations of structure formation in the Cosmos (you already worked with data of the predecessor, the Illustris project) utilised up to 24,000 cores over nearly 2 years (in total almost 150 million CPU hours), to finish one of the largest scientific computations ever. The simulations run on the [Hazel Hen Supercomputer](https://www.hlrs.de/systems/cray-xc40-hazel-hen/) in Stuttgart and used the Arepo code that has been developed by Volker Springel and his group (now at MPA in Garching, before here in Heidelberg at the Centre for Astronomy).\n",
    "\n",
    "Of course, we will not dive into such high-performance computing here, but Python has modules that allow us to parallelise certain computations. in particular, we will now look into the ``multiprocessing`` [package](https://docs.python.org/3.4/library/multiprocessing.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import multiprocessing as mp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First of all, how many cores does our machine actually have? To this end, we can use the ``cpu_count()`` function of ``multiprocessing``."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Number of processors:', mp.cpu_count())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So this is the maximum number of parallel processes our machine can handle. Note however, the above CPU count may also includes logical CPUs if [hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading) is available and enabled. The ``psutil`` package can distinguish and delivers the number of physical cores. Hyper-threading may further boost the computations but that depends on the jobs. Sometimes, it can even slow computations down."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import psutil\n",
    "\n",
    "n_cpus = psutil.cpu_count(logical = False)\n",
    "print('The number of physical cores is:', n_cpus)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A simple example\n",
    "\n",
    "First, we will use the ``Pool`` object of ``multiprocessing`` to parallelise the computation of the cubes of a set of numbers. Such situations are trivially to parallelise because we can simply divide the whole workload into smaller chunks and assign the chunks to the different CPUs of our machine.\n",
    "\n",
    "We will now implement one solution that uses the ``Pool.apply_async()`` functionality of ``Pool``. Alternative methods are, e.g., ``Pool.apply()`` and ``Pool.map()`` and there exist more. We won't have time to go through the differences of these methods here, so if you want to learn more, we recommend to have a look at the ``multiprocessing`` [documentation](https://docs.python.org/3.4/library/multiprocessing.html).\n",
    "\n",
    "First of all, we have to define a function that will be executed by the different CPUs. Here, this functions takes a number and returns the original number and its cube (we have added some \"sleep\" time here because the computation itself does not take that long...):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "def cube(x):\n",
    "    time.sleep(2/x)\n",
    "    return (x, x**3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now set up a ``Pool`` object that consists of ``processes=2`` workers. Because we know how many CPUs our machine has, we could also have used ``pool = mp.Pool(processes=n_cpus)``."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pool = mp.Pool(processes=2)\n",
    "\n",
    "results = []\n",
    "for x in range(1, 11):\n",
    "    results.append(pool.apply_async(cube, args=(x,)))\n",
    "    \n",
    "for p in results:\n",
    "    x, res = p.get()\n",
    "    print(x, res)\n",
    "    \n",
    "pool.close()\n",
    "pool.join()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So how does this code work? As mentioned above, we have set up a pool of 2 workers via ``pool = mp.Pool(processes=2)`` to which we now can send computation jobs. The jobs consist of evaluating the function ``cube`` given the function arguments ``args=(x,)`` and append the return-values of ``cube`` to the ``results`` list. This is done asynchronously here, i.e. whenever a result is ready, it can be returned. The order of the job execution does not matter in our case.\n",
    "\n",
    "The results of the workers itself are returned with the ``get()`` function. The results come in whenever they are ready such that the order of job submission and results must not be the same.\n",
    "\n",
    "After requesting the results, we also close the pool with ``pool.close()`` such that no new jobs can be submitted to our workers. Also, we call ``pool.join()`` which essentially tells Python to wait until all jobs are done."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have seen that it is not that difficult to parallelise simple tasks. We have to be more careful when programming the code and the code to write will be longer, but we are rewarded with faster computations that fully utilise the capabilities of modern computers.\n",
    "\n",
    "During daily scientific work, speeding up certain computations can help a lot. Instead of waiting for hours for one result, one might be able to obtain it within minutes!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A slightly more complicated example - decoding ``md5``\n",
    "\n",
    "Let us now try to \"crack\" a ``md5`` hash that is often used to store passwords. Note that you should never use ``md5`` to encrypt passwords for storage - it is extremely weak as we will see below. \n",
    "\n",
    "In our example, we only consider strings of up to 4 letters that contain only lower-case, alphanumerical characters. The ``md5`` hash to crack is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hash_to_crack = '31d4541b8e926a24f0c9b835b68cfdf3'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use a brute-force method that will illustrate why parallel computing can be efficient. To this end, we will create all possible 4-character long strings, encrypt them with ``md5`` and compare the resulting hash to ``hash_to_crack`` until we found the correct match.\n",
    "\n",
    "To speed up the computations will launch a set of workers that will then do the encryption and comparison in parallel. This is a trivial parallelisation because the individual workers do not need to communicate with each other. Such simple jobs can also be parallelised efficiently on GPUs instead of CPUs but we will not go into this here.\n",
    "\n",
    "For starters, let's see how to generate a ``md5`` hash in Python. This is best done with the ``hashlib`` package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import hashlib\n",
    "\n",
    "def get_md5_hash(s):\n",
    "    return hashlib.md5(s.encode('utf-8')).hexdigest()\n",
    "\n",
    "print(get_md5_hash('test'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now define the allowed characters of the string, which are lower-case, alphanumeric letters using the ``ord`` and ``chr`` functions that you encountered on Day 1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "alphabet = [''] # add empty string (also ensures we probe all strings up to length 4)\n",
    "for i in range(ord('a'), ord('z')+1):\n",
    "    alphabet.append(chr(i))\n",
    "\n",
    "print(alphabet)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Note that we have added an empty char to the list. Why?)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will now use different methods of the ``multiprocessing`` package than in the simple example above, namely the ``Manager`` and ``Queue`` objects. This is to illustrate more clearly how one can think of parallelisation, and also to show different capabilities of ``multiprocessing``."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our strategy is to launch a ``Manager`` that divides the work into individual jobs and sends them to ``Workers``. As before, we have to define what each ``Worker`` has to do:\n",
    "\n",
    "    def do_work(in_queue, out_list):\n",
    "        while (True):\n",
    "            s = in_queue.get()\n",
    "\n",
    "            # capture exit signal\n",
    "            if (s == None):\n",
    "                return\n",
    "\n",
    "            # do work\n",
    "            h_s = get_md5_hash(s)\n",
    "            if (h_s == hash_to_crack):\n",
    "                out_list.append(s)\n",
    "\n",
    "Here, the ``Workers`` take a string from a list called ``in_queue`` provided by the ``Manager``, compute its ``md5`` hash, compare it to the hash we want to decode and report to the ``out_list`` if a matching string is found. Also, the ``Manager`` can send a signal to the ``Workers`` that work is over and that they \"can go home\" (here, this happens if the string retrieved from ``in_queue`` is ``None``, but we could also have chosen ``Feierabend``...). Moreover, once one worker has found a match, all ``Workers`` have done their job and can stop (that is when the ``out_list`` is no longer empty because we have found one solution; note that any objects appended to ``out_list`` by one ``Worker`` are automatically also available to all other ``Workers``)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have to initialise the ``Manager``, the ``Workers`` and let the ``Manager`` distribute the work, just as in the first parallelisation example above. We will also compute the time it takes the program to decode the hash via the ``time.time()`` function of the ``time`` package. On Unix systems, this function returns the time in seconds since January 1, 1970, 00:00:00 (UTC). Other operation systems use different reference points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import product\n",
    "\n",
    "def do_work(in_queue, out_list):\n",
    "    while (True):\n",
    "        s = in_queue.get()\n",
    "\n",
    "        # capture exit signal\n",
    "        if (s == None):\n",
    "            return\n",
    "\n",
    "        # do work\n",
    "        time.sleep(0.0005)\n",
    "        h_s = get_md5_hash(s)\n",
    "        if (h_s == hash_to_crack):\n",
    "            out_list.append(s)\n",
    "            \n",
    "start = time.time()\n",
    "\n",
    "num_workers = 4\n",
    "\n",
    "manager = mp.Manager()\n",
    "results = manager.list()\n",
    "work = manager.Queue(num_workers)\n",
    "\n",
    "# initialise pool of workers    \n",
    "pool = []\n",
    "for i in range(num_workers):\n",
    "    p = mp.Process(target=do_work, args=(work, results))\n",
    "    p.start()\n",
    "    pool.append(p)\n",
    "\n",
    "# split work\n",
    "for c1, c2, c3, c4 in product(alphabet, repeat=4):\n",
    "    s = c1+c2+c3+c4\n",
    "    work.put(s)\n",
    "\n",
    "    # stop sending jobs if a solution was found\n",
    "    if (results):\n",
    "        break\n",
    "\n",
    "# send workers the exit signal\n",
    "for i in range(num_workers):\n",
    "    work.put(None)\n",
    "\n",
    "# wait for all workers to finish\n",
    "for p in pool:\n",
    "    p.join()\n",
    "    \n",
    "end = time.time()\n",
    "\n",
    "print('The computation took {:.2f} seconds'.format(end-start))\n",
    "\n",
    "if (results):\n",
    "    print('We found the plain text! It is \"{:s}\". Wow, this was indeed fast!\".'.format(results[0]))\n",
    "else:\n",
    "    print('We could not find a solution. :(')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise 1:** Go through the code line-by-line and try to understand what it does. Why did we add the empty character ``''`` to ``alphabet`` above? How could you extend the code such that also hashes originally generated from strings containing upper-case letters, numbers and special symbols can be decoded?\n",
    "\n",
    "*Hint:* In the above code, we used the ``itertools.product`` method to make our code more readable. To understand better what this loop structure does, we could have written it also as:\n",
    "\n",
    "    for c1 in alphabet:\n",
    "        for c2 in alphabet:\n",
    "            for c3 in alphabet:\n",
    "                for c4 in alphabet:\n",
    "                    s = c1+c2+c3+c4\n",
    "                    ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It should also be clear from this example that the best passwords are actually **long** passwords. In fact, it does not matter so much how many different characters you use (e.g. in ``alphabet`` above) but the length is much more important because this means brute-force methods have to check too many possible combinations which at some point is no longer feasible.\n",
    "\n",
    "It should be pointed out that the above code is not very efficient. We are merely able to test 10,000 hashes per second on 2 CPUs. Still, grouped with a good dictionary instead of the random generation of strings, we could test all words of a standard English dictionary (~200,000 words) in about 20 seconds. Optimised codes on GPUs can probe a few billion hashes per second..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Final remark:** Just type the ``hash_to_crack`` into Google and see what happens... If you have not yet been convinced, you should be now: ``md5`` is a really bad idea to encrypt sensitive information for storage (such as passwords). The methods is (a) cryptographically no longer secure and (b) way too fast such that brute-force attacks are possible. [Salting](https://en.wikipedia.org/wiki/Salt_(cryptography)) certainly helps but one should nevertheless use \"slower\" hash functions such as ``bcrypt`` to store passwords (and always at least use a salt!)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}